55. (Amended) A test kit for determining if BS322 antigen or anti-BS322 
antibody is present in a test sample, said kit comprising: 

a container containing at least one BS322 polypeptide having at least 95% 
identity over the entire length of a sequence selected from the group consisting of 
SEQUENCE ID NO: 25, SEQUENCE ID NO: 26, SEQUENCE ID NO: 27, and 
SEQUENCE ID NO: 28. 

REMARKS 

The Examiner has maintained the rejection under 35 U.S.C. § 101 and 35 U.S.C. 
§ 112, first paragraph. Applicant respectfully disagrees. 

BS322 is a tissue-specific putative transcription factor in breast tissue found by 
SEREX (serological analysis of recombinant tumor cDNA expression libraries). 
Applicant submits Exhibit A, ["Homo sapiens similar to breast cancer antigen NY-BR-1 
(LOC91074), mRNA"] and Exhibit B, [Jager, D., et al., "Identification of a 
Tissue-specific Putative Transcription Factor in Breast Tissue by Serological Screening of a 
Breast Cancer Library", Cancer Research, 61:2055-2061 (2001)], to illustrate this point. 
Exhibit A shows the homology between BS322 and the molecule NY-BR-1. As 
evidenced by this Exhibit, BS322 and NY-BR-1 are the same molecule. Exhibit B 
shows that NY-BR-1 (BS322) is found in breast cancer tissues. Specifically, NY-BR-1 
(BS322) is found in 21 of 25 breast cancers but in only 2 of 82 nonmammary tumors. 
Thus, BS322 clearly has utility as a diagnostic tool for the detection of breast cancer. 

The Examiner further rejects claims 52-61 under 35 U.S.C. § 112, first paragraph, 
due to the "90% identity" language in the claims. Applicant has raised the percent 
identity to 95%. 
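
As a rough numerical illustration of the amended limitation, the following sketch computes percent identity over the entire length of a reference sequence from a pre-computed pairwise alignment and compares it with the 95% threshold. The function names, the gap character, and the toy sequences are assumptions made for illustration only; they are not taken from any particular alignment program or from SEQUENCE ID NOS: 25-28, and different programs may choose a different denominator.

```python
def percent_identity(ref_aln: str, cand_aln: str, gap: str = "-") -> float:
    """Percent identity over the entire (ungapped) length of the reference.

    Both arguments are rows of an existing pairwise alignment, so they
    must have the same length. Gaps opposite a reference residue count
    as non-identical positions.
    """
    if len(ref_aln) != len(cand_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_len = sum(1 for r in ref_aln if r != gap)
    if ref_len == 0:
        raise ValueError("reference sequence is empty")
    matches = sum(1 for r, c in zip(ref_aln, cand_aln) if r != gap and r == c)
    return 100.0 * matches / ref_len


def meets_threshold(ref_aln: str, cand_aln: str, threshold: float = 95.0) -> bool:
    """True if the candidate has at least `threshold` percent identity."""
    return percent_identity(ref_aln, cand_aln) >= threshold


# Hypothetical 20-residue sequences, invented for this example.
reference = "MKTAYIAKQRQISFVKSHFS"
candidate_one = "MKTAYIAKQRQISFVKSHFA"  # one substitution
candidate_two = "MKTAYIAKQRQISFVKSAAA"  # three substitutions

for name, cand in (("candidate_one", candidate_one), ("candidate_two", candidate_two)):
    print(name, percent_identity(reference, cand), meets_threshold(reference, cand))
```

With a 20-residue reference, a single substitution sits exactly at the 95% boundary, while three substitutions fall below it.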

Applicant further submits the software manual to the Wisconsin Sequence 
Analysis program, Version 8, publicly available from Genetics Computer Group, 
Madison, WI, as Exhibit C. Support for this submission is found on page 16, beginning 
line 7. The manual provides the algorithm, parameters, parameter values and other 
information necessary to calculate the percent identity accurately and consistently. This 
manual indicates on pages 5-21, inter alia, that the software uses the local homology 
algorithm of Smith and Waterman (Advances in Applied Mathematics 2:482-489 
(1981)). 

The Examiner further rejects claims 62-69, 72-76 and 80 under 35 U.S.C. § 112, 
second paragraph, due to the "epitope" language in the claims. 

The methods for identifying epitopes in a novel peptide sequence are well known 
and described in the scientific, commercial, and patent literature. For example, M. 
H. Van Regenmortel describes how to predict epitopes from the primary sequence of a 
protein. (See "Protein structure and antigenicity", Int J Rad Appl Instrum B., 
14(4):277-80, 1987.) 

Further, Perkin-Elmer Biosystems, a major provider of DNA sequencing and 
peptide synthesizing instruments, has established a public website which describes how to 
select peptides which reflect the epitopes of a protein. (See 
http://www.pebio.com/pa/340913/html/chapt2.html#Choosing the Epitope.) This 
electronic publication was posted in 1996 and basically describes the process employed 
by the inventors of the current patent application. 

In addition, patent application PCT/US97/00485 describes in detail how to 
identify epitopes from peptide sequences. The sequence can be scanned for 
hydrophobicity and hydrophilicity values by the method of Hopp, Prog. Clin. Biol. Res. 
172B: 367-377 (1985) or the method of Cease et al, J. Exp. Med. 164: 1779-1784 (1986) 
or the method of Spouge et al, J. Immunol. 138: 204-212 (1987). Commercial software 
programs to implement these methods are available. 
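
The hydrophilicity scanning referred to above can be sketched as a sliding-window average over the primary sequence. The sketch below is only an illustration in the spirit of the Hopp approach: the scale values are the Hopp-Woods hydrophilicity values as they are commonly tabulated (an assumption that should be checked against the cited reference before use), the six-residue window is likewise an assumption, and the peptide is invented and is not a BS322 sequence.

```python
# Hopp-Woods hydrophilicity values as commonly tabulated (assumption:
# verify against the primary reference before relying on them).
HOPP_WOODS = {
    "A": -0.5, "R": 3.0, "N": 0.2, "D": 3.0, "C": -1.0,
    "Q": 0.2, "E": 3.0, "G": 0.0, "H": -0.5, "I": -1.8,
    "L": -1.8, "K": 3.0, "M": -1.3, "F": -2.5, "P": 0.0,
    "S": 0.3, "T": -0.4, "W": -3.4, "Y": -2.3, "V": -1.5,
}


def hydrophilicity_profile(seq: str, window: int = 6):
    """Return (1-based start, mean hydrophilicity) for every window."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [HOPP_WOODS[aa] for aa in seq]
    return [
        (i + 1, sum(scores[i:i + window]) / window)
        for i in range(len(seq) - window + 1)
    ]


def most_hydrophilic_window(seq: str, window: int = 6):
    """Return (1-based start, peptide, mean score) of the top-scoring window."""
    seq = seq.upper()
    start, score = max(hydrophilicity_profile(seq, window), key=lambda p: p[1])
    return start, seq[start - 1:start - 1 + window], score


# Hypothetical peptide: a charged stretch flanked by leucines.
peptide = "LLLLLLKKDEERLLLLLL"
print(most_hydrophilic_window(peptide))
```

Windows with the highest mean hydrophilicity are candidate surface-exposed segments; any such prediction would still need experimental confirmation.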
CONCLUSION 



In view of the aforementioned remarks, Applicant respectfully submits that the 
above-referenced application is now in a condition for allowance and Applicant 
respectfully requests that the Examiner withdraw all outstanding objections and rejections 
and pass the application to allowance. 
Version with Markings to Show Changes Made 

52. (Amended) A [BS322] purified polypeptide, having at least [90%] 95% 
identity over the entire length of a sequence selected from the group consisting of [SEQ 
ID NOS: 25-28] SEQUENCE ID NO: 25, SEQUENCE ID NO: 26, SEQUENCE ID NO: 
27, and SEQUENCE ID NO: 28. 

55. (Amended) A test kit for determining if BS322 antigen or anti-BS322 
antibody is present in a test sample, said kit comprising: 

a container containing at least one BS322 polypeptide having at least 
[90%] 95% identity over the entire length of a sequence selected from the group 
consisting of [SEQ ID NOS: 24-28] SEQUENCE ID NO: 25, SEQUENCE ID NO: 26, 
SEQUENCE ID NO: 27, and SEQUENCE ID NO: 28. 




EXHIBIT A 



RECEIVED 

MHO 9 2002 

TECH CENJER ®O/a0O 



Alignments 

>gi|16156644|ref|XM_035844.3| Homo sapiens similar to breast cancer antigen 
NY-BR-1 (LOC91074), mRNA 
Length = 4408 


Score = 2266 bits (1143), Expect = 0.0 
Identities = 1143/1143 (100%) 
Strand = Plus / Plus 



Query: 1197 aggtttctcacactcatgaaaatgaaaattatctcttacatgaaaattgcatgttgaaaa 1256
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3200 aggtttctcacactcatgaaaatgaaaattatctcttacatgaaaattgcatgttgaaaa 3259

Query: 1257 aggaaattgccatgctaaaactggaaatagccacactgaaacaccaataccaggaaaagg 1316
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3260 aggaaattgccatgctaaaactggaaatagccacactgaaacaccaataccaggaaaagg 3319

Query: 1317 aaaataaatactttgaggacattaagattttaaaagaaaagaatgctgaacttcagatga 1376
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3320 aaaataaatactttgaggacattaagattttaaaagaaaagaatgctgaacttcagatga 3379

Query: 1377 ccctaaaactgaaagaggaatcattaactaaaagggcatctcaatatagtgggcagctta 1436
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3380 ccctaaaactgaaagaggaatcattaactaaaagggcatctcaatatagtgggcagctta 3439

Query: 1437 aagttctgatagctgagaacacaatgctcacttctaaattgaaggaaaaacaagacaaag 1496
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3440 aagttctgatagctgagaacacaatgctcacttctaaattgaaggaaaaacaagacaaag 3499

Query: 1497 aaatactagaggcagaaattgaatcacaccatcctagactggcttctgctgtacaagacc 1556
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3500 aaatactagaggcagaaattgaatcacaccatcctagactggcttctgctgtacaagacc 3559

Query: 1557 atgatcaaattgtgacatcaagaaaaagtcaagaacctgctttccacattgcaggagatg 1616
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3560 atgatcaaattgtgacatcaagaaaaagtcaagaacctgctttccacattgcaggagatg 3619

Query: 1617 cttgtttgcaaagaaaaatgaatgttgatgtgagtagtacgatatataacaatgaggtgc 1676
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3620 cttgtttgcaaagaaaaatgaatgttgatgtgagtagtacgatatataacaatgaggtgc 3679

Query: 1677 tccatcaaccactttctgaagctcaaaggaaatccaaaagcctaaaaattaatctcaatt 1736
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3680 tccatcaaccactttctgaagctcaaaggaaatccaaaagcctaaaaattaatctcaatt 3739

Query: 1737 atgcaggagatgctctaagagaaaatacattggtttcagaacatgcacaaagagaccaac 1796
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3740 atgcaggagatgctctaagagaaaatacattggtttcagaacatgcacaaagagaccaac 3799

Query: 1797 gtgaaacacagtgtcaaatgaaggaagctgaacacatgtatcaaaacgaacaagataatg 1856
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3800 gtgaaacacagtgtcaaatgaaggaagctgaacacatgtatcaaaacgaacaagataatg 3859

Query: 1857 tgaacaaacacactgaacagcaggagtctctagatcagaaattatttcaactacaaagca 1916
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3860 tgaacaaacacactgaacagcaggagtctctagatcagaaattatttcaactacaaagca 3919

Query: 1917 aaaatatgtggcttcaacagcaattagttcatgcacataagaaagctgacaacaaaagca 1976
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3920 aaaatatgtggcttcaacagcaattagttcatgcacataagaaagctgacaacaaaagca 3979

Query: 1977 agataacaattgatattcattttcttgagaggaaaatgcaacatcatctcctaaaagaga 2036
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3980 agataacaattgatattcattttcttgagaggaaaatgcaacatcatctcctaaaagaga 4039

Query: 2037 aaaatgaggagatatttaattacaataaccatttaaaaaaccgtatatatcaatatgaaa 2096
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 4040 aaaatgaggagatatttaattacaataaccatttaaaaaaccgtatatatcaatatgaaa 4099

Query: 2097 aagagaaagcagaaacagaaaactcatgagagacaagcagtaagaaacttcttttggaga 2156
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 4100 aagagaaagcagaaacagaaaactcatgagagacaagcagtaagaaacttcttttggaga 4159

Query: 2157 aacaacagaccagatctttactcacaactcatgctaggaggccagtcctagcatcacctt 2216
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 4160 aacaacagaccagatctttactcacaactcatgctaggaggccagtcctagcatcacctt 4219

Query: 2217 atgttgaaaatcttaccaatagtctgtgtcaacagaatacttattttagaagaaaaattc 2276
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 4220 atgttgaaaatcttaccaatagtctgtgtcaacagaatacttattttagaagaaaaattc 4279

Query: 2277 atgatttcttcctgaagcctacagacataaaataacagtgtgaagaattacttgttcacg 2336
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 4280 atgatttcttcctgaagcctacagacataaaataacagtgtgaagaattacttgttcacg 4339

Query: 2337 aat 2339
            |||
Sbjct: 4340 aat 4342



>gi|13469728|gb|AF269087.1|AF269087 Homo sapiens breast cancer antigen NY-BR-1 
mRNA, complete cds 

Length = 4458 

Score = 2266 bits (1143), Expect = 0.0 
Identities = 1143/1143 (100%) 
Strand = Plus / Plus 

Query: 1197 aggtttctcacactcatgaaaatgaaaattatctcttacatgaaaattgcatgttgaaaa 1256
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3197 aggtttctcacactcatgaaaatgaaaattatctcttacatgaaaattgcatgttgaaaa 3256

Query: 1257 aggaaattgccatgctaaaactggaaatagccacactgaaacaccaataccaggaaaagg 1316
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3257 aggaaattgccatgctaaaactggaaatagccacactgaaacaccaataccaggaaaagg 3316

Query: 1317 aaaataaatactttgaggacattaagattttaaaagaaaagaatgctgaacttcagatga 1376
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3317 aaaataaatactttgaggacattaagattttaaaagaaaagaatgctgaacttcagatga 3376

Query: 1377 ccctaaaactgaaagaggaatcattaactaaaagggcatctcaatatagtgggcagctta 1436
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3377 ccctaaaactgaaagaggaatcattaactaaaagggcatctcaatatagtgggcagctta 3436

Query: 1437 aagttctgatagctgagaacacaatgctcacttctaaattgaaggaaaaacaagacaaag 1496
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3437 aagttctgatagctgagaacacaatgctcacttctaaattgaaggaaaaacaagacaaag 3496

Query: 1497 aaatactagaggcagaaattgaatcacaccatcctagactggcttctgctgtacaagacc 1556
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3497 aaatactagaggcagaaattgaatcacaccatcctagactggcttctgctgtacaagacc 3556

Query: 1557 atgatcaaattgtgacatcaagaaaaagtcaagaacctgctttccacattgcaggagatg 1616
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3557 atgatcaaattgtgacatcaagaaaaagtcaagaacctgctttccacattgcaggagatg 3616

Query: 1617 cttgtttgcaaagaaaaatgaatgttgatgtgagtagtacgatatataacaatgaggtgc 1676
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3617 cttgtttgcaaagaaaaatgaatgttgatgtgagtagtacgatatataacaatgaggtgc 3676

Query: 1677 tccatcaaccactttctgaagctcaaaggaaatccaaaagcctaaaaattaatctcaatt 1736
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3677 tccatcaaccactttctgaagctcaaaggaaatccaaaagcctaaaaattaatctcaatt 3736

Query: 1737 atgcaggagatgctctaagagaaaatacattggtttcagaacatgcacaaagagaccaac 1796
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3737 atgcaggagatgctctaagagaaaatacattggtttcagaacatgcacaaagagaccaac 3796

Query: 1797 gtgaaacacagtgtcaaatgaaggaagctgaacacatgtatcaaaacgaacaagataatg 1856
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3797 gtgaaacacagtgtcaaatgaaggaagctgaacacatgtatcaaaacgaacaagataatg 3856

Query: 1857 tgaacaaacacactgaacagcaggagtctctagatcagaaattatttcaactacaaagca 1916
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3857 tgaacaaacacactgaacagcaggagtctctagatcagaaattatttcaactacaaagca 3916

Query: 1917 aaaatatgtggcttcaacagcaattagttcatgcacataagaaagctgacaacaaaagca 1976
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3917 aaaatatgtggcttcaacagcaattagttcatgcacataagaaagctgacaacaaaagca 3976

Query: 1977 agataacaattgatattcattttcttgagaggaaaatgcaacatcatctcctaaaagaga 2036
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 3977 agataacaattgatattcattttcttgagaggaaaatgcaacatcatctcctaaaagaga 4036

Query: 2037 aaaatgaggagatatttaattacaataaccatttaaaaaaccgtatatatcaatatgaaa 2096
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 4037 aaaatgaggagatatttaattacaataaccatttaaaaaaccgtatatatcaatatgaaa 4096

Query: 2097 aagagaaagcagaaacagaaaactcatgagagacaagcagtaagaaacttcttttggaga 2156
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 4097 aagagaaagcagaaacagaaaactcatgagagacaagcagtaagaaacttcttttggaga 4156

Query: 2157 aacaacagaccagatctttactcacaactcatgctaggaggccagtcctagcatcacctt 2216
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 4157 aacaacagaccagatctttactcacaactcatgctaggaggccagtcctagcatcacctt 4216

Query: 2217 atgttgaaaatcttaccaatagtctgtgtcaacagaatacttattttagaagaaaaattc 2276
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 4217 atgttgaaaatcttaccaatagtctgtgtcaacagaatacttattttagaagaaaaattc 4276

Query: 2277 atgatttcttcctgaagcctacagacataaaataacagtgtgaagaattacttgttcacg 2336
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 4277 atgatttcttcctgaagcctacagacataaaataacagtgtgaagaattacttgttcacg 4336

Query: 2337 aat 2339
            |||
Sbjct: 4337 aat 4339
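
The coordinates printed for the two hits above can be cross-checked against the reported "Identities = 1143/1143". The sketch below is a simple consistency check, not part of the BLAST program; the dictionary layout and function names are assumptions, and the accession labels and coordinates are as read from the hit headers and the first and last alignment rows.

```python
# Coordinates and identity counts as read from the two hits in Exhibit A.
hits = {
    "XM_035844.3": {"query": (1197, 2339), "sbjct": (3200, 4342), "identities": (1143, 1143)},
    "AF269087.1": {"query": (1197, 2339), "sbjct": (3197, 4339), "identities": (1143, 1143)},
}


def span_length(start: int, end: int) -> int:
    """Number of positions covered by an inclusive coordinate range."""
    return end - start + 1


def summarize(hit: dict) -> dict:
    """Derive span lengths, offset, and percent identity for one hit."""
    query_len = span_length(*hit["query"])
    sbjct_len = span_length(*hit["sbjct"])
    matches, aligned = hit["identities"]
    return {
        "query_len": query_len,
        "sbjct_len": sbjct_len,
        # Equal spans that also equal the aligned length imply no gaps.
        "ungapped": query_len == sbjct_len == aligned,
        "percent_identity": 100.0 * matches / aligned,
        # Constant shift between query and subject numbering.
        "offset": hit["sbjct"][0] - hit["query"][0],
    }


for accession, hit in hits.items():
    print(accession, summarize(hit))
```

Both hits cover the same 1143-base query interval without gaps; they differ only in the offset between query and subject numbering.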
ABSTRACT 

Application of SEREX (serological analysis of recombinant tumor 
cDNA expression libraries) to different tumor types has led to the iden- 
tification of several categories of human tumor antigens. In this study, the 
analysis of a breast cancer library with autologous patient serum led to the 
isolation of seven genes, designated NY-BR-1 through NY-BR-7. NY-BR-1, 
representing 6 of 14 clones isolated, showed tissue-restricted mRNA ex- 
pression in breast and testis but not in 13 other normal tissues tested. 
Among tumor specimens, NY-BR-1 mRNA expression was found in 21 of 
25 breast cancers but in only 2 of 82 nonmammary tumors. Structural 
analysis of NY-BR-1 cDNA and the corresponding genomic sequences in 
the recently released working draft of human genome indicated that 
NY-BR-1 is composed of 37 exons and has an open reading frame of 
4.0-4.2 kb, encoding a peptide of Mr 150,000-160,000. A bipartite nuclear 
localization signal motif indicates a nuclear site for NY-BR-1, and the 
presence of a bZIP site (DNA-binding site followed by leucine zipper 
motif) suggests that NY-BR-1 is a transcription factor. Additional struc- 
tural features include five tandem ankyrin repeats, implying a role for 
NY-BR-1 in protein-protein interactions. NY-BR-1 thus represents a 
breast tissue-specific putative transcription factor with autoimmunogenicity 
in breast cancer patients. In addition to NY-BR-1, a homologous gene, 
NY-BR-1.1, was identified in this study. NY-BR-1.1 shares 54% amino acid 
homology with NY-BR-1 and also shows tissue-restricted mRNA expression. 
However, unlike NY-BR-1, NY-BR-1.1 mRNA is expressed in brain, 
in addition to breast and testis. The exon structure of NY-BR-1.1 remains 
to be defined. Using human genome database, NY-BR-1 was localized to 
chromosome 10p11-p12, and NY-BR-1.1 was tentatively localized to 
chromosome 9. 

INTRODUCTION 

Whether immunological factors play a role in the development, 
growth, and progression of human breast cancer remains a critical 
unresolved issue. The lymphocyte infiltrates frequently associated 
with breast cancer (1-5), particularly the intense T- and B-cell infil- 
trates in medullary carcinoma (6-8), and the reactive changes in the 
draining lymph nodes of breast cancer patients (9, 10) are consistent 
with the idea of immune recognition in breast cancer. However, 
efforts to relate the lymphocyte infiltrate and lymph node changes 
with prognosis have not yielded conclusive evidence for such an 
association (11). The search for breast cancer antigens that elicit 
humoral or cellular immune reactions in breast cancer patients also 
has a long history, from evidence for immune responses against the 
murine mammary tumor virus (12) and delayed hypersensitivity and 
humoral immunity against T/Tn antigens (13), to more recent findings 
of antibody and T-cell responses to p53 (14) and HER-2/neu (15, 16). 

One major challenge confronting the analysis of autologous immune 
responses in breast cancer, however, is the well-recognized 
difficulty of establishing breast cancer cell lines as targets for immu- 
nological analysis. This is in contrast to the relative ease of establish- 
ing lines from melanoma, renal cancer, and other tumor types. For this 
reason, the analysis of the human T-cell response against melanoma 
and the molecular identification of the antigens eliciting these re- 
sponses are far more advanced in melanoma (17-19) than in breast 
cancer. 

The recent development of SEREX, 3 a general method to analyze 
the humoral immune response of cancer patients that does not require 
autologous tumor cell lines, provides a powerful new way to dissect 
the immune response to breast cancer. Our initial application of 
SEREX to breast cancer led to the identification of p33ING1, encoded 
by a putative tumor suppressor gene in breast cancer, as an immuno- 
genic breast cancer antigen. In addition, CT antigens, shown previ- 
ously to be immunogenic antigens in other tumor types, were identi- 
fied (20). In the present study, we have continued our effort to define 
breast cancer antigens by SEREX. Of the panel of antigens identified, 
a highly restricted breast autoimmunogenic differentiation antigen, 
NY-BR-1, was identified and characterized. 

MATERIALS AND METHODS 

Tumor Tissue and Cell Lines. The BR17 tumor sample was derived from 
a s.c. metastasis of a 60-year-old female patient at Krankenhaus Nordwest. The 
patient had an unusually favorable history with metastatic ductal carcinoma of 
the breast. Breast cancer cell lines and cell lines of other tumor types were 
obtained from the repository maintained at the Ludwig Institute for Cancer 
Research, New York Branch at the Memorial Sloan-Kettering Cancer Center. 
Tumor tissues were obtained from the Departments of Pathology at The New 
York Presbyterian Hospital and the Memorial Sloan-Kettering Cancer Center. 

RNA Extraction and Construction of cDNA Expression Library. Total 
RNA was extracted from the BR17 breast cancer sample by conventional 
CsCl-guanidine thiocyanate gradient method. A cDNA library was constructed 
in a λ-ZAP Express vector, using a commercial cDNA library kit (Stratagene). 

Immunoscreening of the cDNA Library. The unamplified cDNA expression 
library was screened with the autologous serum at 1:200 dilution. The 
screening procedure was as described previously (21). Briefly, the serum was 
diluted 1:10, preabsorbed with phage-transfected Escherichia coli lysate, further 
diluted to 1:200, and incubated overnight at room temperature with the 
nitrocellulose membranes (Schleicher & Schuell) containing the phage plaques 
at a density of 4000-5000 pfu/130-mm plate. After washing, the filters were 
incubated with alkaline phosphatase-conjugated goat antihuman Fcγ secondary 
antibodies, and the reactive phage plaques were visualized by incubating 
with 5-bromo-4-chloro-3-indolyl-phosphate and nitroblue tetrazolium. 

Sequence Analysis of the Reactive Clones. The reactive clones were 
subcloned, purified, and in vivo excised to pBK-CMV plasmid forms (Stratagene). 
Plasmid DNA was prepared by using the Wizard Miniprep DNA 
Purification System (Promega). The inserted DNA was evaluated by EcoRI-XbaI 
restriction mapping, and clones representing different cDNA inserts were 
sequenced. The sequencing reactions were performed by the DNA Sequencing 
Service at Cornell University (Ithaca, NY) using Applied Biosystems PRISM 
(Perkin-Elmer) automated sequencers. DNA and amino acid sequences were 
compared with sequences in the GenBank and the EST databases using the 
BLAST program. Genes identical to entries in the GenBank were classified as 
known genes, whereas those that shared sequence identity only to ESTs and 
those which have no identity in either GenBank or EST databases were 
designated as unknown genes. 

3 The abbreviations used are: SEREX, serological analysis of recombinant tumor 
cDNA expression libraries; CT, cancer testis; EST, expressed sequence tag; RT-PCR, 
reverse transcription-PCR; RACE, rapid amplification of cDNA ends; ORF, open reading 
frame; CD, cluster of differentiation; pfu, plaque-forming unit. 

RT-PCR. To evaluate the mRNA expression pattern of the cloned cDNA 
in normal and malignant tissues, total RNA was extracted from breast cancer 
cell lines and tumor specimens by the conventional CsCl-guanidine thiocya- 
nate gradient method, and normal tissue RNA was obtained commercially 
(Clontech). Gene-specific oligonucleotide primers were designed to amplify 
cDNA segments of 300-600 bp in length, with the estimated primer melting 
temperature in the range of 65-70°C (see Figs. 2 and 4 for specific primer 
sequences). All primers were synthesized commercially (Operon Technolo- 
gies, Alameda, CA). RT-PCR was performed using 30 amplification cycles in 
a thermal cycler (Perkin-Elmer) at an annealing temperature of 60°C, and the 
products were analyzed by 1.5% gel electrophoresis and ethidium bromide 
visualization. 
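
The text does not state how the primer melting temperatures were estimated. One common rule of thumb for short oligonucleotides is the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C. The sketch below applies that rule as an assumption, not as the authors' method; the function names and the example primer are hypothetical and are not the primers of Figs. 2 and 4.

```python
def wallace_tm(primer: str) -> int:
    """Estimate Tm in degrees C with the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def in_design_window(primer: str, low: int = 65, high: int = 70) -> bool:
    """True if the estimated Tm falls in the 65-70 degree range given in the text."""
    return low <= wallace_tm(primer) <= high


# Hypothetical primer, invented for illustration only.
example_primer = "GCTGAACACATGTATCAAAACG"
print(example_primer, wallace_tm(example_primer), in_design_window(example_primer))
```

The Wallace rule is a coarse approximation that is usually applied to oligonucleotides of roughly 14 to 20 bases; nearest-neighbor methods give different values for longer primers.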

Rapid Amplification of cDNA Ends. RACE reactions (5'-RACE and 
3'-RACE) were performed using gene-specific and adaptor-specific primers in 
conjunction with Marathon-Ready normal testis cDNA and AmpliTaq Gold 
polymerase (Perkin-Elmer). Products were ligated into the PCR-direct cloning 
vector pGEMT plasmid and analyzed by restriction mapping and sequencing. 

Hybridization Screening of a Testicular Library. A commercially ob- 
tained testis cDNA expression library (Stratagene) was screened using a 
NY-BR-1 PCR product as a probe (see Fig. 2 for primer sequences), as 
described in the Stratagene manual. Briefly, a total of 5 × 10^4 pfu/150-mm 
plate were transferred to nitrocellulose membranes (Schleicher & Schuell), the 
membranes were submerged in denaturation solution (1.5 M NaCl and 0.5 M 
NaOH) for 5 min, transferred into neutralization solution (1.5 M NaCl and 0.5 
M Tris-HCl) for 5 min, and then rinsed in 0.2 M Tris-HCl and 2× SSC. The 
membranes were hybridized to a 32P-labeled DNA probe at high stringency 
condition (68°C, aqueous buffer) and washed at high stringency condition. 
Positive clones were subcloned, purified, and in vivo excised to pBK-CMV 
plasmid forms as described above. 

RESULTS 

A total of 7 × 10^5 pfu from the BR17 cDNA library were screened 
using autologous BR17 serum at 1:200 dilution. Fourteen reactive 
clones were purified and sequenced. Comparison to GenBank and 
EST database revealed that these 14 clones were derived from seven 
distinct genes, two known and five unknown. These genes, designated 
NY-BR-1 through NY-BR-7, are described in Table 1. Four clones 
were derived from the two known genes, PBK-1 (BR17-76, BR17-118, 
and BR17-137) and TI-227 (BR17-100). PBK-1 and TI-227 are 
universally expressed genes, because ESTs derived from these two 
genes have been reported in many different normal tissues. Of the 
remaining clones, NY-BR-4 through NY-BR-7, represented by one 
clone each, were also universally expressed based on comparison to 
EST databank entries. The six remaining clones, BR17-1a, BR17-8, 
BR17-35b, BR17-44a, BR17-44b, and BR17-128, were derived from 
the same unknown gene, NY-BR-1. Three matching cDNA sequences 
for NY-BR-1 were found in the EST database, two derived from 
breast cancer (accession numbers AI951118 and AW373574), and the 
third (accession number AW170035) derived from a pooled tissue 
source (testis, fetal lung, and B cell), suggesting a possible 
tissue-restricted expression of NY-BR-1 mRNA (see below). 

Table 1 Clones identified by autologous SEREX screening of BR17 cDNA library 

Designation  Clone                          GenBank  Expression profile
NY-BR-1      BR17-128, BR17-1a, BR17-8,     Unknown  Expressed in normal breast and testis only (RT-PCR)
             BR17-35b, BR17-44a, BR17-44b
NY-BR-2      BR17-76, BR17-118, BR17-137    PBK-1    EST: ubiquitous
NY-BR-3      BR17-100                       TI-227   EST: muscle, colon, endothelium, pancreas
NY-BR-4      BR17-91b                       Unknown  EST: retina, cortex, fetal liver, spleen
NY-BR-5      BR17-115                       Unknown  EST: uterus, melanocytes, fetal heart
NY-BR-6      BR17-117                       Unknown  EST: tonsils, uterus, melanocytes, fetal heart
NY-BR-7      BR17-144                       Unknown  EST: uterus, melanocytes, fetal heart

Structural Analysis of NY-BR-1 cDNA. Compilation of the six 
NY-BR-1 cDNA clones revealed a cDNA sequence of 1464 bp. 
Analysis showed a continuous ORF throughout this sequence, indi- 
cating that this is a partial cDNA sequence, truncated at both 5' and 
3' ends. Comparison with the EST entry AW170035 (446 bp) revealed 
100% sequence identity in the 89 bp overlapping the 5' 
sequence, with the EST entry extending 357 bp further in its 3' 
sequence than NY-BR-1 cDNA clones. Sequences of the other two 
EST entries (AI951118 and AW373574) are contained within NY- 
BR-1. Combining the EST sequence with the cloned NY-BR-1 se- 
quence allowed the definition of the translational termination codon, 
with a 3' untranslated region of 333 bp. 

To complete the missing 5' cDNA sequence, a testicular library was 
screened using a NY-BR-1 PCR product as a probe. One of the clones 
isolated during this screening extended the 5' sequence of NY-BR-1 
1346 bp but did not provide a definite translation initiation site. On the 
basis of this cDNA sequence, a 5' RACE-PCR was performed, and 
the PCR product was cloned into the pGEMT plasmid vector and 
sequenced. This 5'-RACE sequence extended the cDNA sequence 
1292 bp further 5', with the longest ORF starting at the ATG codon 
at position 100. No stop codon was found in the 99-bp 5' sequence, 
suggesting the possibility of additional 5' coding sequence in NY-BR-1. 
However, repeated 5'-RACE using different nested-primer 
pairs and adaptor-ligated cDNA derived from different NY-BR-1 
mRNA-positive tissues (testis and breast, see below) failed to extend 
the 5' cDNA sequence further. 

The available NY-BR-1 cDNA has a 4125-bp coding sequence and 
a 333-bp 3'-untranslated segment (submitted to GenBank, accession 
number AF269087). The predicted amino acid sequence from the 
possible ATG initiation codon (nucleotide position 100) is shown in 
Fig. 1. Motif analysis of the amino acid sequence using PROSITE and 
Pfam search programs identified a bipartite nuclear localization signal 
motif at amino acid position 17-34, suggesting that NY-BR-1 is a 
nuclear protein. Five tandem ankyrin repeats were also identified, 
located at amino acid positions 49-81, 82-114, 115-147, 148-180, 
and 181-213. The presence of a bZIP site (DNA-binding site followed 
by leucine zipper motif) at amino acid position 1077-1104 suggests 
that this nuclear protein functions as a transcription factor. Of interest, 
three additional repetitive elements were identified located between 
the ankyrin repeats and the NH2-terminal bZIP DNA-binding site 
(Fig. 1). The first repetitive element, consisting of 357 nucleotides 
(119 amino acids), is tandemly repeated three times, spanning amino 
acid residues 459-815. The second repetitive sequence, consisting of 
repeats of 11 amino acids, is located between amino acids 224 and 300 
(seven repeats). The third repetitive sequence, consisting of only two 
repeats of 34 amino acids each, is located between amino acids 
301-334 (Fig. 1). 
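
The lengths quoted in this paragraph can be checked against one another with simple arithmetic. The sketch below is only a consistency check on the figures as stated in the text and on the "Length = 4458" reported for AF269087 in Exhibit A; the variable and function names are assumptions.

```python
CODING_BP = 4125        # coding sequence length stated in the text
UTR3_BP = 333           # 3'-untranslated segment stated in the text
GENBANK_LENGTH = 4458   # "Length = 4458" for AF269087 in Exhibit A


def span(start: int, end: int) -> int:
    """Number of residues in an inclusive position range."""
    return end - start + 1


ankyrin_repeats = [(49, 81), (82, 114), (115, 147), (148, 180), (181, 213)]

checks = {
    # Coding sequence plus 3' UTR should equal the deposited length.
    "cdna_total_matches_genbank": CODING_BP + UTR3_BP == GENBANK_LENGTH,
    # 357 nucleotides encode 119 amino acids.
    "repeat_unit_nt_to_aa": 357 // 3 == 119 and 357 % 3 == 0,
    # Residues 459-815 hold three tandem 119-residue units.
    "first_repeat_span": span(459, 815) == 3 * 119,
    # Residues 224-300 hold seven 11-residue repeats.
    "second_repeat_span": span(224, 300) == 7 * 11,
    # The five ankyrin repeats are contiguous and of equal length.
    "ankyrin_lengths": {span(s, e) for s, e in ankyrin_repeats} == {33},
}

# Span of the third repetitive sequence as stated (positions 301-334).
third_repeat_span = span(301, 334)

print(checks)
print("third repetitive sequence span:", third_repeat_span)
```

The stated interval 301-334 covers 34 residues, so the description of two 34-residue repeats cannot be confirmed from those coordinates alone; the other figures are mutually consistent.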

mRNA Expression of NY-BR-1. NY-BR-1 mRNA expression 
was tested in a panel of 15 different normal tissues (adrenal gland, 
fetal brain, lung, mammary gland, pancreas, placenta, prostate, thy- 
mus, uterus, ovary, brain, kidney, liver, colon, and testis). RT-PCR 
analysis showed a strong signal in mammary gland and testis and a 
very faint signal in placenta. All other tissues were negative (Fig. 2A). 

Fig. 1. Predicted amino acid sequence of NY-BR-1. A bipartite nuclear localization 
signal motif is highlighted at amino acid positions 17-34. Five tandem ankyrin repeats are 
located at positions 49-81, 82-114, 115-147, 148-180, and 181-213. A bZIP 
(DNA-binding site followed by a leucine zipper motif) is located at position 1077-1104. The 
peptide segment present only in one of the two splice variants, positions 973-1009, is 
underlined. Three additional repetitive elements were identified (amino acids 459-815, 
224-300, and 301-334; see text). 



NY-BR-1 expression in breast cancers and other tumors was ex- 
amined. Twenty-five breast cancer samples were tested, and 21 of 
them (84%) were positive by RT-PCR (17 showed strong signals, and 
the other 4 samples showed weak to moderate signals; part of the data 
are shown in Fig. 2B). Among 82 nonmammary tumor samples tested 
(36 melanomas, 26 non-small cell lung cancers, 6 colon cancers, 6 
squamous cell carcinomas, 6 transitional cell carcinomas, and 2 
leiomyosarcomas), only 2 melanomas showed NY-BR-1 expression. 
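
The counts reported above can be cross-checked with a few lines of arithmetic. The following is only an illustrative tally of the numbers quoted in the text; the variable names are chosen here and do not come from the article.

```python
# Counts as reported in the RT-PCR results above.
breast_positive, breast_total = 21, 25
strong, weak_to_moderate = 17, 4

# Nonmammary tumor panel, by tumor type.
nonmammary = {
    "melanoma": 36,
    "non-small cell lung cancer": 26,
    "colon cancer": 6,
    "squamous cell carcinoma": 6,
    "transitional cell carcinoma": 6,
    "leiomyosarcoma": 2,
}
nonmammary_positive = 2  # both positives were melanomas

nonmammary_total = sum(nonmammary.values())
breast_rate = breast_positive / breast_total
nonmammary_rate = nonmammary_positive / nonmammary_total

print(f"breast cancers positive: {breast_positive}/{breast_total} = {breast_rate:.0%}")
print(f"nonmammary tumors positive: {nonmammary_positive}/{nonmammary_total} = {nonmammary_rate:.1%}")
```

The panel sizes add up to the 82 nonmammary samples stated in the text, and the strong and weak-to-moderate signals add up to the 21 positive breast cancers.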

The expression of NY-BR-1 in tissue culture lines was also examined 
in cell lines derived from breast tumor, melanoma, and small cell 
lung cancer. Four of six breast cancer cell lines (two showed very 
weak signals), four of eight melanoma lines (2 very weak), and 7 of 
14 small cell lung cancer lines (2 very weak) were positive (data not 
shown). 

Chromosomal Localization and Exon-Intron Organization of 
NY-BR-1. Comparison of the NY-BR-1 sequences with the newly 
available working draft version of the human genome allowed the 
assignment of NY-BR-1 to chromosome 10p11.21-12.1, with at 
least three chromosome 10 clones showing sequence identity to 
NY-BR-1 (GenBank accession numbers AL157387, AL357148, and 
AC067744). 

A comparison of NY-BR-1 cDNA and genomic sequences also 
permitted the definition of NY-BR-1 exon-intron organization. The 
amino acid coding region of this gene contains a basic framework 
of 19 structurally distinct exons, with at least two additional exons 
encoding 3'-untranslated sequence. The detailed exon-intron junction 
information is described in the GenBank entry (accession 
number AF269087). The six ankyrin repeats are encoded by exons 
2-6. Of great interest was the finding that the 357-nucleotide 
repeating unit in NY-BR-1 cDNA is composed of six exons, exons 
10-15. The available genomic sequences are incomplete in this 
region, and only one of the three copies of the 357-bp repeats in 
NY-BR-1 cDNA was identified. This finding suggests that the 
DNA segment between exons 10 and 15 was duplicated and 
inserted in tandem during evolution. In the isolated NY-BR-1 
cDNA clones, three complete copies and one incomplete copy of 
such repeating units are present. Thus, the predicted exon se- 
quences in NY-BR-1 can be expressed as exons 1-15-(10A-15A)- 
(10B-15B)-(10C-13C)-16-21, in which A, B, and C are inexact 
copies of the exon 10-15 sequences. The NY-BR-1 cDNA, there- 
fore, is derived from a total of 37 exons; whether there are allelic 
differences in the copy number of this repetitive element (and thus 
the number of exons) in different individuals is currently unknown. 
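
The total of 37 exons follows directly from the exon layout quoted above. The short tally below is a sketch of that arithmetic only; the segment labels are descriptive names introduced here.

```python
# Exon layout quoted in the text: 1-15-(10A-15A)-(10B-15B)-(10C-13C)-16-21
segments = [
    ("exons 1-15", 1, 15),
    ("copy A (10A-15A)", 10, 15),
    ("copy B (10B-15B)", 10, 15),
    ("copy C (10C-13C, incomplete)", 10, 13),
    ("exons 16-21", 16, 21),
]


def exon_count(first, last):
    """Number of exons in an inclusive range first..last."""
    return last - first + 1


total_exons = sum(exon_count(first, last) for _, first, last in segments)

for label, first, last in segments:
    print(f"{label}: {exon_count(first, last)}")
print("total exons in the cDNA:", total_exons)
```

The basic framework without the repeated copies is 21 exons, which matches the 19 coding exons plus at least two exons of 3'-untranslated sequence described above.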

The available genomic sequence (GenBank AC067744) also al- 
lowed us to extend the 5' sequence of this gene beyond the cloned 
NY-BR-1 cDNA. Translation of the 5' genomic sequence using the 
previously assigned NY-BR-1 ORF led to the identification of a new 
translation initiation site 168 bp upstream to the previously predicted 
ATG initiation codon in NY-BR-1 cDNA (see text above and Fig. 1). 



Fig. 2. NY-BR-1 mRNA expression by RT-PCR. Primers used were: 
5' primer BR1A, 5'-CAAAGCAGAGCCTCCCGAGAAG-3'; and 3' 
primer BR1B, 5'-CCTATGCTGCTCTTCGATTCTTCC-3'. Of 15 
normal tissues, RT-PCR showed strong signals in mammary gland and 
in testis, a very faint signal in placenta, and was negative in other tissues 
(a). Of 19 breast cancer specimens shown, 15 were positive, including 
three weak positives (breast tumors 3, 4, and 5; b). 

[Fig. 3 alignment rows for NY-BR-1 and NY-BR-1.1 are not legible in this copy.] 

Fig. 3. Comparison of the predicted amino acid sequences of NY-BR-1 and NY-BR-1.1. Identical sequences are shown as dots (.), and gaps are shown as dashes (-). 



If this newly identified ATG is the true initiation site used in vivo, the 
NY-BR-1 polypeptide would contain 1397 amino acids, 56 residues 
longer than is depicted in Fig. 1 (additional NH2 terminus sequence: 
MEEISAAAVKVVPGPERPSPFSQLVYTSNDSYIVHSGDLRKIH- 
KAASRGQVRKLEK). 
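
The 56-residue extension is consistent with the 168-bp upstream shift of the initiation codon, since each codon is 3 bp. The check below is a minimal sketch using only the figures and the peptide quoted above; the 1341-residue value is inferred here by subtraction and is not stated in the text.

```python
upstream_bp = 168        # distance of the new ATG upstream of the old one
full_length_aa = 1397    # predicted length if the new ATG is used

# Additional NH2-terminal sequence as printed above (joined across the line break).
extension = "MEEISAAAVKVVPGPERPSPFSQLVYTSNDSYIVHSGDLRKIH" "KAASRGQVRKLEK"

extra_codons = upstream_bp // 3
fig1_length_aa = full_length_aa - len(extension)  # inferred, not quoted in the text

print("codons added by 168 bp:", extra_codons)
print("residues in printed extension:", len(extension))
print("implied length of the Fig. 1 sequence:", fig1_length_aa)
```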

Identification of NY-BR-1 Splice Variants. Sequence compari- 
son of the six SEREX-defined NY-BR-1 clones revealed that they 
were derived from two different splice variants. One variant contains 
an additional coding sequence of 111 bp (nucleotide nos. 3015-3125 
of cloned NY-BR-1 cDNA, encoding amino acids 973-1009; see Fig. 
1), which is absent in the other variant. Comparison with the genomic 
sequence confirmed that this results from an alternate splicing event, 
with the longer variant incorporating part of the intron 33 into exon 34 
(i.e., exon 17 of the basic exon-intron framework described above). 
Key structural elements predicted above in the NY-BR-1 sequence are 
present in both splice variants, suggesting no apparent difference in 
biological function or subcellular localization. 

The expression of these two splice variants was evaluated using 
primers specific to the larger variant, as well as primers spanning the 
alternatively spliced exon. In the normal tissues analyzed, both vari- 
ants showed strong expression in testis and breast by RT-PCR (but not 
in other tissues), with the larger variant being the dominant form in 
testis and the shorter variant dominant in breast. A selective group of 
10 breast cancer samples were typed for these two splice forms, and 
results showed cotyping of the two variants (7 strong positive, 2 
weaker positive, and 1 negative), with the shorter variant consistently 
being the predominant form. 

Isolation of a NY-BR-1 Homologue Gene. Screening a testicular 
cDNA library with a NY-BR-1 probe identified a cDNA encoding a 
new gene with homology to the NY-BR-1 gene. This clone, 3673 bp 
excluding the poly(A) tail, corresponded to nucleotides 1-3481 of 
NY-BR-1 and showed 62% homology. A DNA database search revealed 
no sequence identity to the GenBank "nr" database, and the new 
gene has been designated NY-BR-1.1 (submitted to GenBank, accession 
number AF269088). ORF analysis showed an ORF from nucleotide 
641 to the end of the cloned sequence, with 54% homology to 
the putative NY-BR-1 protein sequence. The ATG initiation codon of 
NY-BR-1.1 is preceded by a 640-bp 5'-untranslated region with 
scattered stop codons. Comparison of the available NY-BR-1 and 
NY-BR-1.1 amino acid sequences is shown in Fig. 3. RT-PCR anal- 
ysis for NY-BR-1.1 showed a tissue-restricted mRNA expression 
pattern distinct from NY-BR-1. Among six normal tissues examined, 
NY-BR-1.1 showed strong RT-PCR signal in testis, moderate signals 
in brain and breast tissues, and was negative in kidney, liver, and 
colon (Fig. 4A). Upon multiple repeated experiments, normal breast 
tissues showed either no or weak signals, consistently weaker than 
those observed in testis and often in brain, indicating a lower level of 
expression. NY-BR-1.1 expression was also examined in six breast 
cancer cell lines and 10 breast cancer specimens. One of six breast 
cancer cell lines was positive, in contrast to four of six for NY-BR-1. 
Eight of 10 breast cancer specimens were positive. In comparison with 
NY-BR-1 expression, 6 were positive for both, 1 was positive for 
NY-BR-1 only, 2 were positive for NY-BR-1.1 only, and 1 was 
negative for both (Fig. 4B). The strong expression in brain and 
low-level expression in normal breast and the lack of correlation in 
NY-BR-1 and NY-BR-1.1 expression in breast cancer lines and tis- 
sues indicate that these two gene products have a clearly distinct 
expression pattern. 

Genomic Sequence of NY-BR-1.1. Comparison of the NY-BR-1.1 
sequence with the released working draft of the human genome 
revealed one clone with sequence identity (GenBank AL359312). 
This clone was presumably derived from chromosome 9, indicating 
that NY-BR-1 and NY-BR-1.1 reside on two different chromosomes. 
The AL359312 genomic sequence does not contain the entire NY- 
BR-1.1 cDNA sequence, precluding the definition of NY-BR-1.1 
exon-intron structure. However, at least three exons can be defined, 
which are the counterparts of the basic framework of exons 16, 17, 
and 18 in NY-BR-1. The exon-intron junctions of NY-BR-1 and 
NY-BR-1.1 are conserved in these exons. 
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Fig. 4. NY-BR-1.1 mRNA expression by RT-PCR. Primers 
used were: 5' primer BR1.1A, 5'-TCTCATAGATGCTGGTGCTGATC-3'; 
and 3' primer BR1.1B, 5'-CCCAGACATTGAATTTTGGCAGAC-3'. 
Of six normal tissues tested, RT-PCR 
showed a strong signal in testis, moderate signals in brain and 
mammary gland, and was negative in kidney, liver, and colon (a). Of 10 
breast cancer specimens, 8 were positive for NY-BR-1.1 (b). Comparing 
the NY-BR-1 and NY-BR-1.1 expression, 7 of 10 cotyped 
(6 positive and 1 negative), whereas two were positive for NY-BR-1.1 
only (tumors 7 and 9), and one was positive for NY-BR-1 
only (tumor 5). 
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DISCUSSION 

The SEREX approach has proved to be a very powerful tool to 
identify tumor antigens (20-30). SEREX-defined antigens have been 
classified into several categories, including differentiation antigens, 
CT antigens, mutational antigens, amplified/overexpressed antigens, 
splice variant antigens, and retroviral antigens (31). The expression of 
NY-BR-1 in breast tissue and testis but not in other normal tissues 
indicates that NY-BR-1 belongs to the category of differentiation 
antigens. 

Differentiation antigens are antigens that show expression in specific 
cell lineage(s) or at specific stages of differentiation in a particular 
cell lineage(s) (32). This category of antigens has been best 
studied in cells of lymphoid and hematopoietic derivation, starting 
with the definition of mouse cell surface differentiation antigens of 
lymphocytes, such as TL (33, 34), Thy-1 (35), and Lyt-2 (36). 
Application of hybridoma technology to the analysis of human cells 
resulted in the identification of a broad range of differentiation anti- 
gens, and this has led to the classification system for CD antigens 
(37). Most of the initial CD antigens were restricted differentiation 
antigens expressed in lymphocytes and other hematopoietic cells, e.g., 
CD1 through CD8 primarily restricted to T cells (38). The expression 
of differentiation antigens in normal cells is generally preserved in 
their neoplastic counterparts, and this feature has made these antigens 
useful markers in the immunopathological differential diagnosis of 
cancer. The best example of this is again in the hematopoietic/ 
lymphocytic lineages, in which the antigenic profile of the neoplastic 
cells provided the foundation for the classification of leukemia/lym- 
phoma (39). In addition, these antigens can be targets for specific 
immunotherapy, and anti-CD20 antibody, recognizing a B-cell differ- 
entiation antigen, represents the first monoclonal antibody approved 
by the Food and Drug Administration for cancer immunotherapy (40). 

In addition to cells of hematopoietic origin, the melanocyte, a 
specialized cell type in the neuroectodermal lineage, has been found 
to express a number of well-characterized differentiation antigens, 
most of them associated with the melanin-synthesis pathway. Studies 
using polyclonal and monoclonal antibodies initially defined tyrosinase 
(41), gp75 (42), and gp100 (43). Recent efforts to identify melanoma 
antigens recognized by CD8+ and CD4+ T cells also identified 
these antigens as T-cell targets and further expanded the list of 
melanocyte differentiation antigens (44-48). Melan-A/MART-1, 
identified by transfection-based T-cell epitope cloning as a CD8+ 
T-cell target (48, 49), and Rab38 (50), identified by SEREX analysis 
of melanoma, are two further examples of melanocyte differentiation 
antigens identified through these efforts. Similar to their hematopoi- 
etic counterparts, the melanocyte differentiation antigens have also 
been found useful in the clinical arena. Antibodies against gp100 and 
Melan-A/MART-1 are widely used to distinguish metastatic melano- 
mas from other metastatic malignancies (51), and melanoma vaccine 
trials targeting these antigens are being actively pursued (see Ref. 52 
for an example). 

With regard to common epithelial tissue, a wide range of gene 
products with differential expression have been defined, e.g., cytokeratins 
(53, 54), mucin-related antigens (55), and hormonal receptors 
(56). However, with rare exceptions, none of these are exclusively 
expressed in only a single epithelial cell type, such as breast epithe- 
lium. In this regard, NY-BR-1 is of considerable interest because of its 
highly restricted expression pattern in normal tissue, i.e., breast and 
testis. The production of antibody probes for NY-BR-1 is essential to 
confirm breast specificity at the protein and cell levels. With regard to 
cancer vaccine development, the restricted expression of NY-BR-1 in 
normal breast and breast cancer makes it a highly attractive vaccine 
target. However, the presence of a homologous gene, NY-BR-1.1, that 
is expressed in brain is cause for concern, and it will be necessary to 
show that T-cell reactivity to NY-BR-1 can be generated without 
cross-recognition of NY-BR-1.1. 

The predicted protein sequence of NY-BR-1 contains a DNA- 
binding site and a leucine zipper motif (bZIP). The bZIP motif 
characterizes the superfamily of eucaryotic DNA-binding transcrip- 
tion factors that contain a basic region mediating sequence-specific 
DNA-binding, followed by a leucine zipper required for dimerization 
(57, 58). It is thus most likely that NY-BR-1 is a transcription factor. 
Five ankyrin repeats are also present in the NY-BR-1 protein. Ankyrin 
repeat proteins carry out a wide variety of biological activities and are 
found in both cytoplasm and nucleus. The repeat motif has been 
recognized in >400 proteins, including cyclin-dependent kinase in- 
hibitors, transcriptional regulators, cytoskeletal organizers, develop- 
mental regulators, and toxins (59). Thus, the ankyrin repeat in itself is 
not predictive of a specific cellular function or subcellular localiza- 
tion; rather, ankyrin repeats are thought to mediate a wide range of 
protein-protein interactions (59). In comparison to other ankyrin 
repeat-containing proteins, NY-BR-1 is unique because of the other 
repetitive elements in its predicted protein sequence. These additional 
repetitive elements are not found in other sequences in the public 
protein databases, and their functional significance remains to be 
determined. 

By comparing the cDNA sequence with the recently released working 
draft of the human genome, we were able to derive the following 
important information about NY-BR-1: (a) confirmation of the cDNA 
sequences of NY-BR-1 and NY-BR-1.1; (b) mapping of NY-BR-1 to chromosome 
10p11-12 and of NY-BR-1.1 tentatively to chromosome 9; (c) definition 
of the exon-intron structure of NY-BR-1, a complex gene with 37 
exons, and correlation of exon structure with repeating peptide units; and (d) 
completion of the NH2 terminus amino acid sequence of NY-BR-1 
that had defied our multiple cloning efforts and RACE analysis. On 
the other hand, the cDNA sequences of NY-BR-1 and NY-BR-1.1 
genes from this study will certainly help the annotation of the corresponding 
genome sequences. The current study thus provides a clear example 
of valuable data that can be achieved by interaction between the 
human genome project and other scientific fields. 

Exhibit C 

FUNCTION 

BestFit makes an optimal alignment of the best segment of similarity between two sequences. Optimal 
alignments are found by inserting gaps to maximize the number of matches using the local homology 
algorithm of Smith and Waterman. 

DESCRIPTION 

BestFit inserts gaps to obtain the optimal alignment of the best region of similarity between two 
sequences, and then displays the alignment in a format similar to the output from Gap. The sequences 
can be of very different lengths and have only a small segment of similarity between them. You could 
take a short RNA sequence, for example, and run it against a whole mitochondrial genome. 

SEARCHING FOR SIMILARITY 

BestFit is the most powerful method in the Wisconsin Sequence Analysis Package™ for identifying the 
best region of similarity between two sequences whose relationship is unknown. 

EXAMPLE 

The sequence gamma.seq contains an Alu family sequence somewhere in the first 500 bases; alu.seq 
contains a generic human Alu family repeat. The two sequences are aligned and the best segment of 
similarity is found with BestFit. 

% bestfit 

BESTFIT of what sequence 1 ? gamma.seq 

Begin (* 1 *) ? 
End (* 11375 *) ? 500 
Reverse (* No *) ? 

to what sequence 2 (* gamma.seq *) ? alu.seq 

Begin (* 1 *) ? 
End (* 207 *) ? 
Reverse (* No *) ? 

What is the gap creation penalty (* 5.00 *) ? 

What is the gap extension penalty (* 0.30 *) ? 

What should I call the paired output display file (* gamma.pair *) ? 

Aligning ... 

Gaps: 3 
Quality: 129.3 
Quality Ratio: 0.625 
% Similarity: 84.466 

% 

OUTPUT 

Here is the output file. Notice how BestFit finds and displays only the best segments of similarity: 

BESTFIT of: gamma.seq check: 6474 from: 1 to: 500 

Human fetal beta globins G and A gamma 

from Shen, Slightom and Smithies, Cell 26; 191-203 

Analyzed by Smithies et al. Cell 26; 345-353. 

to: alu.seq check: 4238 from: 1 to: 207 

HSREP2 from the EMBL data library 

Human Alu repetitive sequence located near the insulin gene 
Dhruva B.R., Shenk T., Subramanian K.N.; "Integration in vivo into 
simian virus 40 DNA of a sequence that resembles a certain family of 
genomic interspersed repeated sequences"; Proc. Natl. Acad. Sci. U.S.A. 

Symbol comparison table: GenCoreDisk:[GcgCore.Data.Rundata]Swgapdna.Cmp 

Gap Weight: 5.000 Average Match: 1.000 

Length Weight: 0.300 Average Mismatch: -0.900 

Quality: 129.3 Length: 209 

Ratio: 0.625 Gaps: 3 

Percent Similarity: 84.466 Percent Identity: 84.466 

gamma.seq x alu.seq June 20, 1994 15:15 

137 AGACCAACCTGGCCAAC^TGGTGAAATCCCATCTCTAC.aWtaC^AA 185 

imiii 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 .iiiiiiiiii " 

1 AGACCAGCCTGGCCAACATGGTGAAACTCCATCTCTACTGAAAATACAAA 50 

186 AATTAGACAGGCATGAXGGCAAGTGCCTGTAArCCCAGCTACTTGGGAGG 235 

IIIIIIJIMIIII II lllllll I 1 I | f I L i f I I 1 I | Mill 

51 AATTAGCCAGGCATGGTGA^G^GrGCCTGS&ATCCCAGCTACTTA^SGAGG 100 

236 CTGAGGAAGGAGAATTGCTrGAACCTGGAAGGCAGGAGTTGCAGTGAGCC 285 

1 1 1 1 ! .. I i J I I I l_J I I I I II I | | | | hi | | | | | 

101 CTaaacMaMi^^ 149 

286 GAGATCATACCACTGCACTCCAGCCTGGGrGACAGAACAAGAGTCTGTCr 335 

I • I i ^ I i. -J I I I I M I I I I I | | J | | | | | | | | j. ,,,,,, • , , , 

150 GAGATCGCACGGCTGCACTCCAGCCT '. GGTGACAGAGCGAGACTCCATCr 198 
336 CAAAAAAAA 344 

RELATED PROGRAMS 

When you want an alignment that covers the whole length of both sequences, use Gap. When you are 
trying to find only the best segment of similarity between two sequences, use BestFit. PileUp creates a 
multiple sequence alignment of a group of related sequences, aligning the whole length of all sequences. 
DotPlot displays the entire surface of comparison for a comparison of two sequences. GapShow 
displays the pattern of differences between two aligned sequences. PlotSimilarity plots the average 
similarity of two or more aligned sequences at each position in the alignment. Pretty displays 
alignments of several sequences. LineUp is an editor for editing multiple sequence alignments. 
CompTable helps generate scoring matrices for peptide comparison. 

ALGORITHM 

BestFit uses the local homology algorithm of Smith and Waterman (Advances in Applied 
Mathematics 2; 482-489 (1981)) to find the best segment of similarity between two sequences. BestFit 
reads a scoring matrix that contains values for every possible GCG symbol match (see the LOCAL 
DATA FILES topic below). The program uses these values to construct a path matrix that represents 
the entire surface of comparison with a score at every position for the best possible alignment to that 
point. The quality score for the best alignment to any point is equal to the sum of the scoring matrix 
values of the matches in that alignment, less the gap creation penalty times the number of gaps in that 
alignment, less the gap extension penalty times the total length of all gaps in that alignment. The gap 
creation and gap extension penalties are set by you. If the best path to any point has a negative value, 
a zero is put in that position. 



After the path matrix is complete, the highest value on the surface of comparison represents the end of 
the best region of similarity between the sequences. The best path from this highest value backwards 
to the point where the values revert to zero is the alignment shown by BestFit. This alignment is the 
best segment of similarity between the two sequences. 
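
The path-matrix procedure described above can be sketched in a few lines. The function below is an illustrative score-only implementation written for this note, not BestFit's code: it keeps the zero floor and reports the highest value on the surface, charges each gap the creation penalty once plus the extension penalty per gap symbol, and omits the traceback that recovers the alignment itself. The function name and the test sequences are assumptions.

```python
def best_local_score(a, b, match=1.0, mismatch=-0.9,
                     gap_creation=5.0, gap_extension=0.3):
    """Highest local-alignment score on the surface of comparison of a and b."""
    neg_inf = float("-inf")
    cols = len(b) + 1
    prev_h = [0.0] * cols       # best score of a path ending at each cell, previous row
    prev_f = [neg_inf] * cols   # best score ending in a vertical gap, previous row
    best = 0.0
    for i in range(1, len(a) + 1):
        cur_h = [0.0] * cols
        cur_f = [neg_inf] * cols
        e = neg_inf             # best score ending in a horizontal gap, current row
        for j in range(1, cols):
            # Open a new gap (creation + one extension) or lengthen an existing one.
            e = max(cur_h[j - 1] - gap_creation - gap_extension, e - gap_extension)
            cur_f[j] = max(prev_h[j] - gap_creation - gap_extension,
                           prev_f[j] - gap_extension)
            diag = prev_h[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A negative best path is replaced by zero, as the text describes.
            cur_h[j] = max(0.0, diag, e, cur_f[j])
            best = max(best, cur_h[j])
        prev_h, prev_f = cur_h, cur_f
    return best


a = "ACGTACGTACGGATCCGGAT"
b = a[:10] + "T" + a[10:]   # same sequence with one extra base inserted
print(best_local_score(a, a))
print(round(best_local_score(a, b), 3))
```

With the default nucleic acid values, bridging the single inserted base costs 5.0 + 0.3, so the gapped alignment of the two 20-base copies scores 20 - 5.3.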

For nucleic acids, the default scoring matrix has a match value of 1.0 for each identical symbol 
comparison and -0.90 for each non-identical comparison (not considering nucleotide ambiguity symbols 
for this example). The quality score for a nucleic acid alignment can, therefore, be determined using 
the following equation: 

Quality = 1.0 x TotalMatches + -0.90 x TotalMismatches 
        - (GapCreationPenalty x GapNumber) 
        - (GapExtensionPenalty x TotalLengthOfGaps) 
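
As a worked illustration of the nucleic acid equation, the sketch below evaluates it for the gamma.seq/alu.seq example. The match, mismatch, and gap-length counts (174, 32, and 3) are not stated in the manual; they are inferred here from the reported Length of 209, Gaps of 3, and Percent Identity of 84.466, and should be read as an assumption.

```python
def nucleic_quality(matches, mismatches, gaps, total_gap_length,
                    gap_creation=5.0, gap_extension=0.3,
                    match_value=1.0, mismatch_value=-0.9):
    """Quality score for a nucleic acid alignment under the default matrix."""
    return (match_value * matches
            + mismatch_value * mismatches
            - gap_creation * gaps
            - gap_extension * total_gap_length)


# Inferred counts for the example alignment (assumption, see lead-in).
matches, mismatches, gaps, total_gap_length = 174, 32, 3, 3

quality = nucleic_quality(matches, mismatches, gaps, total_gap_length)
percent_identity = 100 * matches / (matches + mismatches)
alignment_length = matches + mismatches + total_gap_length

print(round(quality, 1), round(percent_identity, 3), alignment_length)
```

Under these inferred counts the equation reproduces the Quality, Percent Identity, and Length figures shown in the example output file.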

The quality score for a protein alignment is calculated in a similar manner. However, while the default 
nucleic acid scoring matrix has a single value for all non-identical comparisons, the default protein 
scoring matrix has different values for the various non-identical amino acid comparisons. The quality 
score for a protein alignment can therefore be determined using the following equation (where Total_AA 
is the total number of A-A (Ala-Ala) matches in the alignment, CmpVal_AA is the value for an A-A 
comparison in the scoring matrix, Total_AB is the total number of A-B (Ala-Asx) matches in the 
alignment, CmpVal_AB is the value for an A-B comparison in the scoring matrix, ...): 

Quality = CmpVal_AA x Total_AA 
        + CmpVal_AB x Total_AB 
        + ... 
        - (GapCreationPenalty x GapNumber) 
        - (GapExtensionPenalty x TotalLengthOfGaps) 



For a more complete discussion of scoring matrices, see the Data Files manual. 

CONSIDERATIONS 

BestFit Always Finds Something 

BestFit always finds an alignment for any two sequences you compare - even if there is no 
significant similarity between them! You must evaluate the results critically to decide if the 
segment shown is not just a random region of relative similarity. 

The Segments Shown Obscure Alternative Segments 

BestFit only shows one segment of similarity, so if there are several, all but one is obscured. You 
can approach this problem with graphic matrix analysis (see the Compare and DotPlot 
programs). Alternatively you can run BestFit on ranges outside the ranges of similarity found 
in earlier runs to bring other segments out of the shadow of the best segment. 

The Best Fit is Only One Member of a Family 

Like all fast gapping algorithms, the alignment displayed is a member of the family of best 
alignments. This family may have other members of equal quality, but will not have any 
member with a higher quality. The family is usually significantly different for different choices 
of gap creation and gap extension penalties. See the CONSIDERATIONS topic in the entry for 
the Gap program to learn more about how to assign these penalties. 

The Surface of Comparison 

^t^?^ 6 / TS^* j ? 13 P 1 ** 01 * 011 * 1 to *e area of the surface of comparison. 
That area is determined by the product of the lengths of the two sequences compared. BestFit 
can evaluate a surface of up to 3.5 million elements. This surface would be large enough to 
accommodate two sequences of 1,870 symbols or one sequence 200-symbols long 
and another 17,500-symbols long. When you have much longer sequences that are 
similar, you can use the command-line option -LIMit to use the surface more efficiently. 
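
The size limit described above amounts to a simple product check. The helper below is a hypothetical illustration, not part of BestFit; the function names are introduced here, and the 3.5 million figure is the limit quoted in the text.

```python
MAX_SURFACE = 3_500_000  # largest surface of comparison quoted in the text


def surface_area(length1, length2):
    """Area of the full surface of comparison for two sequences."""
    return length1 * length2


def fits_full_surface(length1, length2, limit=MAX_SURFACE):
    """True when the whole surface can be evaluated without limiting it."""
    return surface_area(length1, length2) <= limit


for pair in [(1870, 1870), (200, 17500), (500, 207), (30000, 30000)]:
    print(pair, surface_area(*pair), fits_full_surface(*pair))
```

Both example sizes from the text fit within the limit, while two sequences at the 30,000-symbol maximum input length do not.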

The Public Scoring Matrix for Nucleic Acid Comparisons is Very Stringent 

The scoring matrix swgapdna.cmp penalizes mismatches -0.9 so the segments found may be very 
short. This means that the alignment cannot be extended by three bases to pick up one 
extra match. The scoring matrix used by Smith and Waterman, when local alignments were first 
described, used -0.333 for the mismatch penalty. You can use Fetch to copy swgapdna.cmp and 
change the values, or use nwsgapdna.cmp, which has no mismatch penalty. 

Rapid Alignment 

When possible, BestFit tries to find the optimal alignment very quickly. If this rapid alignment 
is not unambiguously optimal, BestFit automatically realigns the sequences to calculate the 
optimal alignment. When this occurs, the monitor of alignment progress on your terminal screen 
(Aligning ...) is displayed twice for a single alignment. 

ALIGNING LONG SEQUENCES 

BestFit can align very long sequences if you know roughly where the alignment of interest 
begins. Run the program with the command-line option -LIMit. Then set the starting coordinates for 
each sequence. The program asks you to set gap shift limits on each 
sequence. The program aligns the sequences from your starting point such that the sequences do 

not get out of phase by more than the gap shift limits you have set. If you started both sequences at 
base number one and set the gap shift limit for sequence one to 100 and for sequence two to 50, then 
base 350 in sequence one could not be gapped to any base outside of the range from 300 to 
450 in sequence two. 

If you do not put -LIMit on the command line, the program automatically sets gap shift limits if they are 
required for the alignment of your sequences to proceed. In this case, the program limits the total 
length of gaps that can be inserted into each sequence and calculates the best alignment within this 
limited surface of comparison. The program then performs a calculation to determine 
whether the alignment could possibly be improved if there were no restriction on the total length of 
gaps in each sequence. If the program cannot rule out this possibility, it displays a message 
that the alignment is not guaranteed to be optimal. Because the criteria used by the program 
for guaranteeing an optimal alignment are very stringent, a limited alignment often may be 
optimal even if this message is displayed. In any event, the program continues to completion. 

EVALUATING ALIGNMENT SIGNIFICANCE 

This program can help you evaluate the significance of the alignment, using a simple statistical 
method, with the -RANdomizations command-line option. The second sequence is repeatedly 
shuffled, maintaining its length and composition, and then realigned to the first sequence. The average 
alignment score, plus or minus the standard deviation, of all randomized alignments is reported in the 
output file. You can compare this average quality score to the quality score of the actual alignment to 
help evaluate the significance of the alignment. The number of randomizations can be specified along 
with the -RANdomizations command line qualifier; the default is 10. 

The score of each randomized alignment is reported to the screen. You can use <Ctrl>C to interrupt 
the randomizations and output the results from those randomized alignments that have been completed. 

*»£f £ ~ Monte Cario jj^ 
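
The shuffling test described in this section can be sketched as follows. This is an illustration under stated assumptions, not BestFit's implementation: the scorer is a simplified Smith-Waterman with a flat penalty of 5.3 per gap symbol, the shuffling uses Python's random module with a fixed seed, and the two short sequences are made up for the example.

```python
import random
import statistics


def local_score(a, b, match=1.0, mismatch=-0.9, gap=5.3):
    """Smith-Waterman score with a flat per-symbol gap penalty (a simplification)."""
    prev = [0.0] * (len(b) + 1)
    best = 0.0
    for ch_a in a:
        cur = [0.0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            cur.append(max(0.0, diag, prev[j] - gap, cur[j - 1] - gap))
            best = max(best, cur[j])
        prev = cur
    return best


def randomized_scores(seq1, seq2, randomizations=10, seed=0):
    """Scores of seq1 against shuffled copies of seq2 (length and composition kept)."""
    rng = random.Random(seed)
    symbols = list(seq2)
    scores = []
    for _ in range(randomizations):
        rng.shuffle(symbols)
        scores.append(local_score(seq1, "".join(symbols)))
    return scores


seq1 = "ACGTACGTACGGATCCGGAT"
seq2 = "TTACGTACGTACGGATCCAA"   # shares a 16-base stretch with seq1

actual = local_score(seq1, seq2)
shuffled = randomized_scores(seq1, seq2)
print("actual quality:", actual)
print("randomized: %.2f +/- %.2f" % (statistics.mean(shuffled),
                                      statistics.stdev(shuffled)))
```

Comparing the actual score with the mean and standard deviation of the shuffled scores gives the rough significance check the text describes; the default of 10 randomizations mirrors the manual.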

ALIGNMENT METRICS 

BestFit and Gap display four figures of merit for alignments: Quality, Ratio, Identity, and Similarity. 

The Quality (described above) is the metric maximized in order to align the sequences. Ratio is the 
quality divided by the number of bases in the shorter segment. Percent Identity is the percent of the 
symbols that actually match. Percent Similarity is the percent of the symbols that are similar. 
Symbols that are across from gaps are ignored. A similarity is scored when the scoring matrix value for 
a pair of symbols is greater than or equal to 0.50, the similarity threshold. This threshold is also used 
by the display procedure to decide when to put a ':' (colon) between two aligned symbols. You can reset 
the threshold with the second optional parameter of -PAIr. For instance, the expression 
-PAIr=1.0,0.5 would set the similarity threshold to 0.5. 
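
The identity and similarity figures can be computed from a pair of aligned rows as sketched below. The function and the two small scoring functions are illustrative assumptions; in particular, the protein values (1.5 for a match, 0.8 for I/L, -0.5 otherwise) are invented for the example and are not taken from swgappep.cmp.

```python
def alignment_metrics(row1, row2, score, similarity_threshold=0.5, gap_char="."):
    """Return (percent identity, percent similarity) for two aligned rows."""
    compared = identical = similar = 0
    for x, y in zip(row1, row2):
        if x == gap_char or y == gap_char:
            continue  # symbols across from gaps are ignored
        compared += 1
        if x == y:
            identical += 1
        if score(x, y) >= similarity_threshold:
            similar += 1
    if compared == 0:
        return 0.0, 0.0
    return 100 * identical / compared, 100 * similar / compared


def nucleic_score(x, y):
    """Default nucleic acid values quoted in the manual."""
    return 1.0 if x == y else -0.9


def toy_protein_score(x, y):
    """Hypothetical protein values, for illustration only."""
    if x == y:
        return 1.5
    return {frozenset("IL"): 0.8}.get(frozenset((x, y)), -0.5)


dna_identity, dna_similarity = alignment_metrics("ACGT.ACGT", "ACGAAAC.T", nucleic_score)
pep_identity, pep_similarity = alignment_metrics("MIK", "MLR", toy_protein_score)
print(round(dna_identity, 3), round(dna_similarity, 3))
print(round(pep_identity, 3), round(pep_similarity, 3))
```

With the nucleic acid values, only identical symbols reach the 0.5 threshold, which is why the example output file reports the same figure for Percent Similarity and Percent Identity.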

I1<£^£i% nti » ^ "* M <"^^ <" w » *, A»U not be used 

PEPTIDE SEQUENCES 

If your input sequences are peptide sequences, this program uses a scoring matrix with matches scored 
as 1.5 and mismatches scored according to the evolutionary distance between the amino acids as 
measured by Dayhoff and normalized by Gribskov (Gribskov and Burgess, Nucl. Acids Res. 14(16) (1986)). 
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RESTRICTIONS - 



Input sequences may not be more than 30,000-symbols long. This program cannot evaluate a surface of 
comparison larger than 3.5 million elements. A 200 x 17,500 comparison is possible, as is a 
1,870 x 1,870 comparison. See the ALIGNING LONG SEQUENCES topic for help in aligning 
sequences that exceed the maximum surface of comparison. You can also ask your 
system manager to increase the surface of comparison if your system has enough virtual memory. 

SEQUENCE TYPE 

The function of BestFit depends on whether your input sequence(s) are protein or nucleotide. Normally 
the type of a sequence is determined by the presence of either Type: N or Type: P on the last line of the 
text heading just above the sequence. If your sequence(s) are not the correct type, turn to 

Appendix VI for information on how to change or set the type of a sequence. 

COMMAND-LINE SUMMARY 

All parameters for this program may be put on the command line. Use the option -CHEck to see the 
summary below and to have a chance to add things to the command line before the 
program executes. In the summary below, the capitalized letters in the qualifier names are the letters that you must type 
in order to use the parameter. Square brackets ([ and ]) enclose qualifiers or parameter values that are 
optional. 

Minimal Syntax: % bestfit [-INfile1=]gamma.seq [-INfile2=]alu.seq -Default 

Prompted Parameters: 

-BEGin1=1 -BEGin2=1 beginning of each sequence 

-END1=500 -END2=207 end of each sequence 

-NOREV1 -NOREV2 strand of each sequence 

-GAPweight=5.0 gap creation penalty (3.0 is protein default) 

-LENgthweight=0.3 gap extension penalty (0.1 is protein default) 

[-OUTfile1=]gamma.pair output file for alignment 

Local Data Files: 

-DATa=swgapdna.cmp scoring matrix for nucleic acids 

-DATa=swgappep.cmp scoring matrix for peptides 

Optional Parameters: 

-OUTfile2=gamma.gap       new sequence files with gaps added
-OUTfile3=alu.gap
-LIMit1=499 -LIMit2=205   limit the surface of comparison
-RANdomizations[=10]      determine average score from 10 randomized alignments
-PAIr=1.0,0.5,0.1         thresholds for displaying '|', ':', and '.'
-WIDth=50                 the number of sequence symbols per line
-PAGe=60                  adds a line with a form feed every 60 lines
-NOBIGGaps                suppresses abbreviation of large gaps with '.'s
-HIGhroad                 makes the top alignment for your parameters
-LOWroad                  makes the bottom alignment for your parameters
-NOSUMmary                suppresses the screen summary
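
The capitalization rule in the summary can be illustrated with a short Python sketch. It assumes, which the manual does not state, that the lowercase tail of a qualifier name may be typed in full, in part, or not at all, that digits in the name are always required, and that matching ignores case. The function names are hypothetical and are not part of the program.

```python
import re


def qualifier_pattern(name: str) -> re.Pattern:
    """Build a regex in which each run of lowercase letters is an optional tail."""
    parts = []
    i = 0
    while i < len(name):
        if name[i].islower():
            j = i
            while j < len(name) and name[j].islower():
                j += 1
            run = name[i:j]
            # Nest the optional groups so that only a leading part of the
            # lowercase run can be typed ("", "i", or "in" for "in").
            optional = ""
            for ch in reversed(run):
                optional = f"(?:{re.escape(ch)}{optional})?"
            parts.append(optional)
            i = j
        else:
            # Capital letters, digits, and the dash are required as written.
            parts.append(re.escape(name[i]))
            i += 1
    return re.compile("".join(parts), re.IGNORECASE)


def accepts(typed: str, name: str) -> bool:
    """True if the typed text is an acceptable spelling of the qualifier name."""
    return qualifier_pattern(name).fullmatch(typed) is not None


print(accepts("-BEG1", "-BEGin1"))      # capitals and digit only
print(accepts("-NOS", "-NOSUMmary"))    # too short: misses required capitals
```

Under these assumptions -BEG1 and -begin1 both select -BEGin1, while -NOS is too short to select -NOSUMmary.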



LOCAL DATA FILES



The files described below supply auxiliary data to this program. The program automatically reads 
them from a public data directory unless you either 1) have a data file with exactly the same name in 
your current working directory; or 2) name a file on the command line with an expression like 
-DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.

If the first sequence you name is a nucleic acid, BestFit uses the scoring matrix in the public file 
swgapdna.cmp. (SW stands for Smith and Waterman.) If the first sequence you name is a peptide 
sequence, BestFit reads swgappep.cmp instead. The presence of these files in your current working 
directory causes BestFit to read your version instead. (See the Data Files manual for more 
information about scoring matrices.) 
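
The lookup order described above can be sketched in Python as follows. This is only an illustration: the function names and the 'N'/'P' type codes passed in are hypothetical, and the sketch assumes that a file named on the command line takes precedence over a same-named file in the working directory, which the text does not say either way.

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def scoring_matrix_name(first_sequence_type: str) -> str:
    """Pick the default matrix from the type of the first sequence ('N' or 'P')."""
    if first_sequence_type.upper() == "N":
        return "swgapdna.cmp"   # nucleic acids
    if first_sequence_type.upper() == "P":
        return "swgappep.cmp"   # peptides
    raise ValueError("sequence type must be 'N' or 'P'")


def resolve_data_file(name: str,
                      public_dir: PathLike,
                      working_dir: PathLike,
                      command_line_file: Optional[PathLike] = None) -> Path:
    """Return the data file the program would read under the stated rules."""
    # Assumption: an explicit -DATa file wins over everything else.
    if command_line_file is not None:
        return Path(command_line_file)
    # A file with exactly the same name in the working directory overrides
    # the public copy.
    local = Path(working_dir) / name
    if local.is_file():
        return local
    # Otherwise fall back to the public data directory.
    return Path(public_dir) / name
```

For a nucleic acid run with no local copy and no -DATa qualifier, the sketch resolves to swgapdna.cmp in the public directory.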

OPTIONAL PARAMETERS 



The parameters and switches listed below can be set from the command line. For more information, 
see "Using Program Parameters" in Chapter 3, Basic Concepts: Using Programs in the User's Guide.

-LIMit1=20 and -LIMit2=20

let you set gap shift limits for each sequence. When you already know of a long similarity
between two sequences you can "zip" them together using this mode. The beginning coordinates 
for each sequence must be near the beginning of the alignment you want to see. The alignment 
continues so that gaps inserted do not require the sequences to get out of step by more than the 
gap shift limits. You can align very long sequences rapidly. The surface of comparison is still
limited to 3.5 million. The size of a comparison can be predicted by multiplying the average 
length of the two sequences by the sum of the two shift limits. 

If you add -LIMit to the command line without any qualifier value, the program prompts you to
enter gap shift limits for each sequence. 
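
The size prediction stated above (average length of the two sequences multiplied by the sum of the two shift limits) can be checked with a few lines of Python. The function names are hypothetical; the 3.5 million figure is the limit quoted in the text.

```python
MAX_SURFACE = 3_500_000  # surface-of-comparison limit quoted in the text


def limited_surface(len1: int, len2: int, limit1: int, limit2: int) -> float:
    """Predicted comparison size: average length times the sum of the shift limits."""
    average_length = (len1 + len2) / 2
    return average_length * (limit1 + limit2)


def fits(len1: int, len2: int, limit1: int, limit2: int) -> bool:
    """True if the predicted size stays within the quoted limit."""
    return limited_surface(len1, len2, limit1, limit2) <= MAX_SURFACE


# Two 30,000-symbol sequences with the shift limits of 20 used above.
print(limited_surface(30_000, 30_000, 20, 20))
print(fits(30_000, 30_000, 20, 20))
```

With shift limits of 20 and 20, two sequences of the maximum 30,000 symbols give a predicted size of 1.2 million, which is within the limit.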

-RANdomizations=10

reports the average alignment score and standard deviation from 10 randomized alignments in 
which the second sequence is repeatedly shuffled, maintaining the length and composition of the 
original sequence, and then aligned to the first sequence. You can use the optional parameter to 
set the number of randomized alignments to some number other than 10.
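
The randomization procedure can be sketched in Python as below. The scoring function is a stand-in written for this illustration (a basic Smith-Waterman local score with a flat penalty per gap symbol and fixed match and mismatch values); it is not the program's own scoring matrix or gap model, and all names are hypothetical.

```python
import random
import statistics


def local_score(a: str, b: str, match: float = 1.0,
                mismatch: float = -0.9, gap: float = 1.0) -> float:
    """Best local alignment score (stand-in scoring, flat per-symbol gap penalty)."""
    prev = [0.0] * (len(b) + 1)
    best = 0.0
    for x in a:
        cur = [0.0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            cur.append(max(0.0, diag, prev[j] - gap, cur[j - 1] - gap))
            best = max(best, cur[j])
        prev = cur
    return best


def randomized_scores(seq1: str, seq2: str, n: int = 10, seed=None):
    """Shuffle seq2 n times, align each shuffle to seq1, return (mean, std dev)."""
    if n < 2:
        raise ValueError("need at least two randomizations for a standard deviation")
    rng = random.Random(seed)
    symbols = list(seq2)          # shuffling keeps length and composition
    scores = []
    for _ in range(n):
        rng.shuffle(symbols)
        scores.append(local_score(seq1, "".join(symbols)))
    return statistics.mean(scores), statistics.stdev(scores)


mean, sd = randomized_scores("GACCAT" * 5, "GACAT" * 5, n=10, seed=1)
print(mean, sd)
```

Comparing the score of the real alignment with this mean and standard deviation gives a rough sense of whether the alignment is better than would be expected from composition alone.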

-OUTfile2=seqname1.gap -OUTfile3=seqname2.gap

This program can write three different output files. The first displays the alignment of sequence
one with sequence two. The second is a new sequence file for sequence one, possibly expanded by
gaps to make it align with sequence two. The third, like the second, is a new sequence file for
sequence two, possibly expanded by gaps to make it align with sequence one. The program
writes only the first file unless there are output file options on the command line. If there are
any output files named on the command line, only those output files are written. If you add
-OUT to the command line without any qualifying filename, then the program will write the
second and third output files after prompting you for their names.

Aligned sequences (in sequence files) can be displayed with GapShow. Their similarity can be 
displayed with PlotSimilarity. 

-PAIr=1.0,0.5,0.1 

The paired output file from this program displays sequence similarity by printing one of three 
characters between similar sequence symbols: a pipe character (|), a colon (:), or a period (.).
Normally a pipe character is put between symbols that are the same, a colon is put between 
symbols whose comparison value is greater than or equal to 0.50, and a period is put between 
symbols whose comparison value is greater than or equal to 0.10. You can change these match 
display thresholds from the command line. The three parameters for -PAIr are the display
thresholds for the pipe character, colon, and period. The match display criterion for a pipe 
character changes from symbolic identity (the default) to the quantitative threshold you have set 
in the first parameter. A pipe character will no longer be inserted between identical symbols 
unless their comparison values are greater than or equal to this threshold. If you still want a 
pipe character to connect identical symbols, use x instead of a number as the first parameter. 
(See the Data Files manual for more information about scoring matrices.) 
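
The display rule can be summarized in a short Python sketch. The function name is hypothetical, the comparison value is assumed to have been looked up in the scoring matrix already, and the "x" setting for the first parameter is modeled here as None.

```python
def match_symbol(a: str, b: str, value: float,
                 thresholds=(None, 0.5, 0.1)) -> str:
    """Return the character printed between two aligned symbols.

    thresholds is (pipe, colon, period). A pipe threshold of None stands for
    the default (or "x") behavior: a pipe marks symbols that are identical.
    """
    pipe_t, colon_t, period_t = thresholds
    if pipe_t is None:
        if a == b:
            return "|"
    elif value >= pipe_t:
        # With a numeric first parameter, identity alone no longer earns a pipe.
        return "|"
    if value >= colon_t:
        return ":"
    if value >= period_t:
        return "."
    return " "


print(match_symbol("A", "A", 1.5))                    # identical symbols
print(match_symbol("A", "A", 0.9, (1.0, 0.5, 0.1)))   # identical, below pipe threshold
```

In the second call the symbols are identical, but their comparison value is below the pipe threshold of 1.0, so the sketch falls through to the colon rule.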



-PAGe=64 



When you print the output from this program, it may cross from one page to another in a 
frustrating way - especially when you print on individual sheets. This option adds form feeds to 
the output file in order to try to keep clusters of related information together. You can set the 
number of lines per page by supplying a number after the -PAGe qualifier. 
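
As a rough illustration of the option, the Python sketch below inserts a form feed line after every fixed number of output lines. It is deliberately simpler than what the text describes: it makes no attempt to keep clusters of related information together, and the function name is hypothetical.

```python
def paginate(lines, page_length=64):
    """Return the lines with a form feed line added after each full page."""
    lines = list(lines)
    out = []
    for i, line in enumerate(lines, start=1):
        out.append(line)
        # Add a form feed after each full page, but not after the last line.
        if i % page_length == 0 and i < len(lines):
            out.append("\f")
    return out


pages = paginate(["alignment line"] * 130, page_length=64)
print(pages.count("\f"))
```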



-WIDth=50 



puts 50 sequence symbols on each line of the output file. You can set the width to anything from 
10 to 150 symbols.
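
A minimal Python sketch of the line-width rule follows. The function name is hypothetical; the default of 50 and the 10 to 150 range are the values given above.

```python
def wrap_symbols(sequence: str, width: int = 50):
    """Split a sequence into output lines of at most `width` symbols."""
    if not 10 <= width <= 150:
        raise ValueError("width must be between 10 and 150 symbols")
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]


for line in wrap_symbols("GACCAT" * 20, width=50):
    print(line)
```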

-NOBIGGaps 

suppresses large gap abbreviations, showing all the sequence characters across from large gaps. 
Usually, gaps that extend one sequence by more than one complete line of output are abbreviated 
with three dots arranged in a vertical line. 

-LOWroad and -HIGhroad 

The insertion of gaps is, in many cases, arbitrary, and equally optimal alignments can be 
generated by inserting gaps differently. When equally optimal alignments are possible, this 
program can insert the gaps differently if you select either the -LOWroad or the -HIGhroad 
options. Here are examples for the alignment of GACCAT with GACAT with different 
parameters. 

For: Match = 1.0        MisMatch = -0.9
     Gap weight = 1.0   Length Weight = 0.0

LowRoad:  1 GACCAT 6
            || |||      Quality = 4.0
          1 GA.CAT 5

HighRoad: 1 GACCAT 6
            ||| ||      Quality = 4.0
          1 GAC.AT 5





For: Match = 1.0        MisMatch = 0.0
     Gap weight = 3.0   Length Weight = 0.0

HighRoad: 1 GACCAT 6
            |||         Quality = 3.0
          1 GACAT. 5

LowRoad:  1 GACCAT 6
               |||      Quality = 3.0
          1 .GACAT 5



Essentially the low road shifts all of the arbitrary gaps in sequence two to the left and all of the
arbitrary gaps in sequence one to the right. The high road does exactly the opposite. When neither
road is selected, the program tries not to insert a gap whenever that is possible and
uses the high road alternative for all collisions.
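
The tie between gap placements in the GACCAT and GACAT examples can be reproduced with a small Python sketch. It is not the program's alignment algorithm: it only tries every position for a single one-symbol gap in the shorter sequence and scores each placement. It assumes that a gap at either end of the alignment is not penalized (an assumption chosen because it reproduces the quality values of 4.0 and 3.0 shown above) and that an internal gap costs the gap weight plus the length weight. All function names are hypothetical.

```python
def placements(seq1, seq2, match, mismatch, gap_weight, length_weight):
    """Score every way of padding seq2 with one gap symbol ('.')."""
    if len(seq1) != len(seq2) + 1:
        raise ValueError("sketch only handles a length difference of one")
    results = []
    for pos in range(len(seq2) + 1):
        padded = seq2[:pos] + "." + seq2[pos:]
        score = 0.0
        for a, b in zip(seq1, padded):
            if b == ".":
                # Assumption: only internal gaps are penalized.
                if 0 < pos < len(seq2):
                    score -= gap_weight + length_weight
            elif a == b:
                score += match
            else:
                score += mismatch
        results.append((round(score, 6), padded))
    return results


def roads(seq1, seq2, match, mismatch, gap_weight, length_weight):
    """Return (best score, low road, high road) among equally optimal placements."""
    scored = placements(seq1, seq2, match, mismatch, gap_weight, length_weight)
    best = max(score for score, _ in scored)
    ties = [padded for score, padded in scored if score == best]
    # Low road: gap in sequence two as far left as possible; high road: as far right.
    return best, ties[0], ties[-1]


print(roads("GACCAT", "GACAT", 1.0, -0.9, 1.0, 0.0))
print(roads("GACCAT", "GACAT", 1.0, 0.0, 3.0, 0.0))
```

Under these assumptions the first parameter set ties at GA.CAT and GAC.AT, and the second ties at .GACAT and GACAT., with the leftmost gap corresponding to the low road in each case.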



-SUMmary 



writes a summary of the program's work to the screen when you've used the -Default qualifier to
suppress all program interaction. A summary typically displays at the end of a program run
interactively. You can suppress the summary for a program run interactively with -NOSUMmary.

Use this qualifier also to include a summary of the program's work in the log file for a program run in